\section{Optimization Evaluation}
 \begin{enumerate}
 \item We evaluate different versions of SERIMI to show their time efficiency.
 \item We show how it performs (time and F1) when we reduce the number of sets.
 \item We show how it performs if we exclude any of the defined features.
 \item We show how it performs when only direct comparison, only class-based disambiguation, or both are used.
\end{enumerate}

In this section, we evaluate different versions of SERIMI in four scenarios: 1) we evaluate SERIMI's performance without and with Alg.~\ref{alg:candidateset} (referred to as S and S+SR, respectively); 2) we evaluate SERIMI's performance when the features described in Sec. x are removed, i.e., when no predicates are used (S+SR+NP), when no datatype properties are used (S+SR+ND), when no object properties are used (S+SR+NO), and when no tuples are used (S+SR+NT); 3) we evaluate SERIMI's performance when the TOP-1 approach (S+SR+TOP-1) or the Threshold approach (S+SR+Threshold) is used to select correct matches; 4) we evaluate the performance of the direct match alone (DM) and of SERIMI combined with the direct match (S+SR+DM). In each case, our aim is to measure time efficiency and effectiveness in finding correct matches. We use the data collections and metrics proposed in the Ontology Alignment Evaluation Initiative (OAEI) benchmark.

\subsection{Experiment Setting}
\textbf{Data Collections.} We used the data collections of OAEI 2010 and 2011 in our evaluations. From the OAEI 2010 collections, the life science (LS) collection (which includes DBPedia,
%\footnote{http://dbpedia.org/About}  
Sider,
%\footnote{http://www4.wiwiss.fu-berlin.de/sider/}  
Drugbank,
%\footnote{http://www4.wiwiss.fu-berlin.de/drugbank/}  
LinkedCT,%\footnote{http://data.linkedct.org/}  
 Dailymed, 
%\footnote{http://www4.wiwiss.fu-berlin.de/dailymed/} 
% 
TCM,
%\footnote{http://code.google.com/p/junsbriefcase/wiki/RDFTCMData} 
and Diseasome) 
%\footnote{http://www4.wiwiss.fu-berlin.de/diseasome/} 
and the Person-Restaurant (PR) collection were used. The task in this benchmark was to match the entities of each of these datasets to those of another. All cases evaluated by other participants in this challenge are evaluated here. From the 2011 data, the New York Times (NYT) collection was used. The task in this benchmark was to match NYT entities to DBPedia, Geonames and Freebase. All these cases are evaluated here.

\textbf{Evaluation metrics.} We used precision, recall and F1, as proposed by the OAEI benchmark, to measure the effectiveness of the proposed approach. True positives are the mappings found by SERIMI that exist in the provided reference mapping (the ground truth). False positives are the mappings found by SERIMI that do not exist in the ground truth.
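
For reference, the standard definitions of these metrics can be written as follows, where $M$ denotes the set of mappings found by SERIMI and $G$ the reference mapping (the symbols $M$ and $G$ are introduced here only for illustration):
\[
P = \frac{|M \cap G|}{|M|}, \qquad
R = \frac{|M \cap G|}{|G|}, \qquad
F1 = \frac{2 \cdot P \cdot R}{P + R}.
\]
Here $|M \cap G|$ is the number of true positives and $|M| - |M \cap G|$ is the number of false positives.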
%to enable a direct and fair 
%for comparison. 
%As discussed, these two preliminary works are the main solutions for effective instance matching in this heterogeneous setting.  
%In addition, we investigated how SERIMI performs using different settings for the parameters $\delta$ and $k$. 
%This way, we assess how these parameters affect the overall performance of SERIMI. 
\subsection{Experiment Results}
Fig.~\ref{fig:f1-exp1} and Fig.~\ref{fig:time-exp1} show the aggregate F1 and time performance of each case that we evaluated.

\textbf{Time efficiency.} We observed (Fig.~\ref{fig:time-exp1} and Table~\ref{timetable}) that SERIMI is 3x faster with Alg.~\ref{alg:candidateset} than without it (S+SR vs. S), because the number of sets used in the class-based match, and consequently the number of comparisons, is considerably reduced in S+SR. We can also observe that all features are relevant to SERIMI: removing any of them does not make SERIMI much faster but can decrease its F1 (compare S+SR with NP, ND, NO, and NT). On average, the DM approach (19.87s) is 8.1x and 2.7x faster than S (161.06s) and S+SR (53.78s), respectively. However, the best average F1 (92\%) is reached with S+SR+DM (59.85s), which, although 3x slower than DM alone, is significantly more effective in a few cases (e.g., 82\% F1 vs. 67\% F1 in the NYTIMES-DBPEDIA-GEO case).
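
As a quick check, the speedup factors can be recomputed from the average times (in seconds) reported in the AVERAGE row of Table~\ref{timetable}; the subscripted symbol $t$ is introduced here only for illustration:
\[
\frac{t_{\mathrm{S}}}{t_{\mathrm{S+SR}}} = \frac{161.06}{53.78} \approx 2.99, \qquad
\frac{t_{\mathrm{S}}}{t_{\mathrm{DM}}} = \frac{161.06}{19.87} \approx 8.11,
\]
\[
\frac{t_{\mathrm{S+SR}}}{t_{\mathrm{DM}}} = \frac{53.78}{19.87} \approx 2.71, \qquad
\frac{t_{\mathrm{S+SR+DM}}}{t_{\mathrm{DM}}} = \frac{59.85}{19.87} \approx 3.01.
\]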

\textbf{Match effectiveness.} S+SR+TOP-1 has better overall performance (86\% F1) than S+SR+Threshold (84\% F1), especially on datasets where a 1-to-1 mapping could be enforced. S+SR+TOP-1 performs poorly in two cases (Person21-Person22, 51\% F1, and Sider-Dailymed, 56\% F1) where there are 1-to-many mappings between the datasets; in those cases, the S+SR+Threshold approach performed better (Person21-Person22, 86\% F1, and Sider-Dailymed, 82\% F1). SERIMI has its best average F1 (92\%) when both direct match and class-based match were used together (S+SR+DM). This gain happened because the direct match could reinforce the similarity between instances when an overlap between the source and target existed. There was no case where the direct match alone performed poorly, because all pairs of datasets overlap at least in their keys. However, this does not mean that an overlap in the keys is enough to guarantee a high F1 in all cases, as shown in the NYTIMES-DBPEDIA-GEO case, where DM reached 67\% F1 and S+SR 80\% F1.
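
The difference between the two selection strategies can be sketched as follows. This is only an illustrative reading, assuming that every candidate $t$ in the candidate set $C(s)$ of a source instance $s$ receives a score $\mathit{score}(s,t)$ and that $\delta$ is the threshold; the symbols $C$, $\mathit{score}$, $M_{\mathrm{top1}}$ and $M_{\mathrm{thr}}$ are assumptions introduced for this sketch:
\[
M_{\mathrm{top1}}(s) = \Bigl\{ \arg\max_{t \in C(s)} \mathit{score}(s,t) \Bigr\}, \qquad
M_{\mathrm{thr}}(s) = \bigl\{ t \in C(s) \mid \mathit{score}(s,t) \geq \delta \bigr\}.
\]
Under this reading, TOP-1 returns exactly one match per source instance and can therefore miss correct matches when the mapping is 1-to-many, whereas Threshold may return several matches per source instance.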

In conclusion, the results show that the best configuration is to combine SERIMI with the direct match to reach the highest F1, and to use Alg.~\ref{alg:candidateset} to reduce the number of comparisons made by SERIMI during the computation of the scores.

In the rest of the evaluations in this paper, we use S+SR+DM; TOP-1 is used when a 1-to-1 mapping is identified, and the Threshold approach is used otherwise.

\begin{figure}[]
\centering
\includegraphics[width=0.40\textwidth]{f1.pdf}
\caption{Average F1 performance for different SERIMI configurations.} 
\label{fig:f1-exp1}
\end{figure} 

\begin{figure}[]

\centering
\includegraphics[width=0.40\textwidth]{time.pdf}
\caption{Average time performance for different SERIMI configurations.}
\label{fig:time-exp1}
\end{figure} 

\begin{center}
\begin{table*}
\centering
\caption{F1 Performance for Different SERIMI Configurations}
\label{tablef1s}
\scriptsize\tt
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | c | c | } 
\hline
Datasets & S & S+SR & S+SR+DM & DM & S+SR(Threshold) & S+SR(Top-1) & S+SR+NP & S+SR+ND & S+SR+NO & S+SR+NT \\
\hline
 DAILYMED-SIDER & 1.0 & 1.0 & 1.0 & 1.0 & 0.99 & 1.0 & 1.0 & 1.0 & 1.0 & 1.0 \\
DISEASOME-SIDER & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 \\
DRUGBANK-SIDER & 1.0 & 1.0 & 1.0 & 1.0 & 0.99 & 1.0 & 1.0 & 1.0 & 1.0 & 1.0 \\
NYTIMES-DBPEDIA-CORP & 0.88 & 0.88 & 0.91 & 0.83 & 0.78 & 0.88 & 0.87 & 0.88 & 0.88 & 0.88 \\
NYTIMES-DBPEDIA-GEO & 0.81 & 0.8 & 0.82 & 0.67 & 0.36 & 0.8 & 0.8 & 0.81 & 0.81 & 0.81 \\
NYTIMES-DBPEDIA-PER & 0.95 & 0.95 & 0.95 & 0.93 & 0.91 & 0.95 & 0.95 & 0.95 & 0.95 & 0.95 \\
NYTIMES-FREEBASE-CORP & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 \\
NYTIMES-FREEBASE-GEO & 0.89 & 0.89 & 0.89 & 0.89 & 0.88 & 0.89 & 0.88 & 0.89 & 0.89 & 0.89 \\
NYTIMES-FREEBASE-PER & 0.95 & 0.95 & 0.95 & 0.95 & 0.94 & 0.95 & 0.95 & 0.95 & 0.95 & 0.95 \\
NYTIMES-GEONAMES & 0.79 & 0.79 & 0.87 & 0.87 & 0.35 & 0.79 & 0.65 & 0.78 & 0.79 & 0.78 \\
PERSON11-PERSON12 & 0.53 & 0.48 & 0.95 & 0.97 & 0.49 & 0.48 & 0.48 & 0.49 & 0.47 & 0.48 \\
PERSON21-PERSON22 & 0.87 & 0.86 & 0.91 & 0.91 & 0.86 & 0.51 & 0.86 & 0.86 & 0.86 & 0.86 \\
RESTAURANT1-RESTAURANT2 & 0.97 & 0.96 & 0.97 & 0.97 & 0.94 & 0.96 & 0.96 & 0.96 & 0.96 & 0.96 \\
SIDER-DAILYMED & 0.85 & 0.82 & 0.74 & 0.72 & 0.82 & 0.56 & 0.73 & 0.82 & 0.82 & 0.82 \\
SIDER-DBPEDIA-DRUGS & 0.94 & 0.94 & 0.94 & 0.94 & 0.93 & 0.94 & 0.94 & 0.94 & 0.94 & 0.94 \\
SIDER-DBPEDIA-SIDEEFFECT & 0.9 & 0.9 & 0.89 & 0.89 & 0.89 & 0.9 & 0.9 & 0.9 & 0.9 & 0.9 \\
SIDER-DISEASOME & 0.91 & 0.91 & 0.89 & 0.88 & 0.92 & 0.91 & 0.91 & 0.91 & 0.91 & 0.91 \\
SIDER-DRUGBANK & 0.97 & 0.97 & 0.98 & 0.98 & 0.96 & 0.97 & 0.97 & 0.97 & 0.97 & 0.97 \\
SIDER-TCM & 0.99 & 0.99 & 0.99 & 0.99 & 0.99 & 0.99 & 0.99 & 0.99 & 0.99 & 0.99 \\
\hline
AVERAGE & 0.90 & 0.89 & 0.92 & 0.91 & 0.84 & 0.86 & 0.88 & 0.89 & 0.89 & 0.89 \\
\hline 
\end{tabular}  
\end{table*}  
\end{center}

\begin{center}
\begin{table*}
\centering
\caption{Time Performance (in Seconds) for Different SERIMI Configurations}
\label{timetable}
\scriptsize\tt
\begin{tabular}{ | c | c | c | c | c | c | c | c | c | c | c | } 
\hline
Datasets & S & S+SR & S+SR+DM & DM & S+SR(Threshold) & S+SR(Top-1) & S+SR+NP & S+SR+ND & S+SR+NO & S+SR+NT \\
\hline
DAILYMED-SIDER & 44.96 & 15.95 & 20.85 & 17.11 & 14.58 & 14.64 & 21.62 & 14.34 & 14.04 & 18.04 \\
DISEASOME-SIDER & 2.13 & 1.39 & 1.46 & 1.4 & 1.33 & 1.39 & 2.39 & 1.37 & 1.37 & 1.7 \\
DRUGBANK-SIDER & 7.93 & 7.03 & 8.89 & 8.14 & 7.73 & 7.59 & 10.59 & 7.45 & 7.65 & 9.06 \\
NYTIMES-DBPEDIA-CORP & 128.51 & 72.18 & 75.81 & 19.15 & 70.9 & 69.67 & 69.91 & 66.16 & 66.8 & 71.28 \\
NYTIMES-DBPEDIA-GEO & 1358.09 & 477.49 & 518.16 & 80.18 & 454.18 & 481.11 & 466.89 & 474.96 & 477.54 & 482.92 \\
NYTIMES-DBPEDIA-PER & 634.72 & 202.38 & 235.5 & 95.04 & 202.44 & 210.33 & 189.34 & 205.77 & 203.76 & 197.65 \\
NYTIMES-FREEBASE-CORP & 116.64 & 33.91 & 39.21 & 25.76 & 32.31 & 35.93 & 28.36 & 32.13 & 32.15 & 36.08 \\
NYTIMES-FREEBASE-GEO & 50.14 & 33.93 & 32.08 & 19.07 & 26.22 & 25.1 & 22.92 & 25.43 & 25.65 & 24.6 \\
NYTIMES-FREEBASE-PER & 345.25 & 86.55 & 99.06 & 58.57 & 83.66 & 81.01 & 74.85 & 85.26 & 79.95 & 79.66 \\
NYTIMES-GEONAMES & 173.0 & 45.44 & 51.68 & 16.37 & 43.76 & 46.14 & 32.95 & 42.55 & 43.39 & 34.26 \\
PERSON11-PERSON12 & 6.35 & 1.99 & 2.39 & 1.39 & 1.98 & 1.96 & 1.8 & 1.89 & 1.92 & 1.77 \\
PERSON21-PERSON22 & 2.84 & 2.53 & 2.77 & 0.45 & 2.49 & 2.5 & 2.02 & 2.16 & 2.25 & 1.65 \\
RESTAURANT1-RESTAURANT2 & 0.35 & 0.27 & 0.31 & 0.15 & 0.26 & 0.28 & 0.24 & 0.26 & 0.24 & 0.23 \\
SIDER-DAILYMED & 67.51 & 12.34 & 15.66 & 9.94 & 11.83 & 11.43 & 12.29 & 11.51 & 11.06 & 10.37 \\
SIDER-DBPEDIA-DRUGS & 16.32 & 7.44 & 8.66 & 8.66 & 8.57 & 7.64 & 8.3 & 7.41 & 7.5 & 7.27 \\
SIDER-DBPEDIA-SIDEEFFECT & 16.0 & 2.89 & 3.56 & 2.58 & 2.75 & 2.71 & 2.74 & 4.28 & 2.76 & 2.64 \\
SIDER-DISEASOME & 1.26 & 0.5 & 0.68 & 0.54 & 0.54 & 0.53 & 0.5 & 1.3 & 0.53 & 0.5 \\
SIDER-DRUGBANK & 87.48 & 17.44 & 20.18 & 12.94 & 16.07 & 17.14 & 16.63 & 17.78 & 15.86 & 14.44 \\
SIDER-TCM & 0.58 & 0.17 & 0.19 & 0.18 & 0.14 & 0.15 & 0.18 & 0.17 & 0.14 & 0.14 \\
\hline
AVERAGE & 161.06 & 53.78 & 59.85 & 19.87 & 51.67 & 53.54 & 50.76 & 52.75 & 52.35 & 52.33 \\
\hline 
\end{tabular}  
\end{table*}  
\end{center}